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Abstract In this paper, we propose the V-index (or. Virtuosity index) as a novel 
metric to assess the scientific virtuosity of academics. This index can be applied to 
researchers and journals as well. In particular, we show that the V-index fills the gap 
of h-index and similar metrics in considering the self-citations of authors or journals. 
The paper provides with three real-world examples: in the first, we evaluate the re- 
search impact of the most productive scientists in Computer Science (according to 
DBLP); in the second, we assess the virtuosity of the journals ranked in the "Com- 
puter Science Applications" section of SCImago; in the last, we apply V-index for 
the assessment of the 201 1 research activity of 130 countries all over the world. 

Keywords Bibliometrics • Self-citations • /i-index 



1 Introduction 

Since its introduction lfT2l . the adoption of the h-index as an evaluation metric to as- 
sess the scientific impact of researchers, raised some concerns in the scientific com- 
munity. Prior to discuss them, to get some insights about the rationale behind the 
functioning of perhaps the most common research evaluation metric, we define it as 
inlfll: 

Definition 1 The index h is defined as the number of papers of a given author with a 
number of citations equal or greater than h. 
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In origin, the h-index has been devised to manage with a number of disadvantages 
brought by the adoption of single-number evaluation criteria, such as total number of 
published paper per author, total number of citations collected by an author or by 
her/his most representative publications, and so forth. 

Single-number criteria tend to ignore the significance of published papers, to fa- 
vor lower productivity or seniority of authors, or to randomly favor or disfavor au- 
thors, depending too much from arbitrary threshold parameters lfT2ll . 

Not only the h-index (and its several variants ll24l [Tll6ll25l) but also the ratio rep- 
resenting the number of citations per publication ||T6l can efficiently deal with these 
issues. On the other hand, both of them suffer from major drawbacks, emerged as 
effect of their large adoption and highlighted by current literature Il2ll5ll2()il . 

Regarding the citation per publication metric, it is simple to notice that it could 
be biased towards authors who publish a small amount of papers that acquire a lot 
of citations - which is not always an indicator of research impact. To this purpose, 
think about survey papers that seldom contribute to the state-of-the-art of a discipline 
but, counter-intuitively, tend to be highly-cited papers given the broader audience 
of readers to which they are directed. In addition, since bibliometric indicators are 
also applied to assess the impact of scientific journals, the adoption of a citation 
per publication criterium penalizes journals publishing more research products with 
respect to those publishing, for example, surveys or reviews. 

The motivations underlying the concerns relative to the adoption of the h-index 
are more subtle and profound. Some of the drawbacks discussed above do not apply to 
the h-index. For example, it can successfully deal with those authors who publish few 
highly-cited papers or with those journals publishing few broad interest (potentially 
highly-citable) works. 

Unfortunately, also the h-index is not exempt of drawbacks Il5ll27l[3]|. to name a 
few: f i) it does not take into account the number of authors per publication or its order; 
(ii) it does not consider the amount of years of activity of the given scholar/journal; 
(Hi) it does not take into consideration the scientific field of each given publica- 
tion/journal and, finally (iv) it can suffer from manipulation because of the self- 
citations. 

As previously stated, a vast number of variants of the h-index have been proposed 
during the latest years, to address the shortcomings of the original measure ll24l [Tll6l. 
For example, regarding the issue at point (i), hl-index f4l and hm-index |T9l represent 
an attempt to modify the original h-index taking into account the number of authors 
per paper. As for the shortcoming at point f ii), age-weighted citation indicators such 
as AWCR, AWCRpA (13] and AW-index have been recently advanced. Concerning 
point (Hi), a number of works try to assess the impact of the research field and its 
citation habits on the evaluation of the h-index of authors ll4l l2ll24lfT4ll and journals 

GQim. 

The last point, related to the effect of self-citations on the computation of h-index 
(and variants) is the core of this paper First of all, let us formally define the concept 
of self-citation as follows: 
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Definition 2 A self-citation Cp^^^ is any citation appearing in a paper p pointing to 
paper q, whose set of authors are respectively A [p] and A[^], for which it holds true: 
A\p\ C^A[q] ^ 0, i.e., the intersection of authors' sets is not empty. 

The quantitative analysis of the phenomenon of self-citations acquired relevant 
attention only during the latest years [3|. In fact, differently from what stated by 
Hirsch 111 2 1 in the original h-index paper, it has been recently proved that the effect of 
self-citations on the h-index is relevant and introduces a bias in the evaluation system 
[10 23 1 . Some authors stated that the h-index seems to be quite robust, since in some 
fields self-citations are less likely to occur |9|. Differently, in some other cases the 
h-index appear to be heavily biased by self-citations I ZTlfTTIl . 

During the investigation of this common behavior, a number of different reason- 
able motivations have been advanced, such that: ( i) the phenomenon of self-citation is 
physiological in "niche" fields, in which only a small number of authors contributes; 
(ii) it is a natural effect also in "hot topics", trending scientific fields where there 
is a tight core of publications which tend to be highly-cited by authors working in 
these fields, i.e., the same who have written these publications |18|; (Hi) it is com- 
mon in those fields in which the scientific contributions represent small increments 
of previous existing theories (e.g., physics, biology), where most authors are com- 
pelled to cite their previous work to avoid reproducing the same material in further 
publications. 

On the other hand, a number of additional reasons, unlikely to be shared, hold 
such as: (i) lack of attitude to bibliographical research, replaced by citing own pre- 
vious works that describe underlying ideas even if are not appropriate to the context 
of the current work |18|; (ii) due to the current state of the research which requires 
indicators to evaluate productivity and impact of scientist, it is disreputable that some 
scientists inflate their own indexes by means of voluntary self-citations; (Hi) consid- 
ering the context of application of h-index to journals, is a deprecable yet apparently 
common practice the coercive citation |26|, aimed at inflating the impact factor and 
the h-index of some journals of doubtful prestige and ethics. 

Regardless the motivations underlying the self-citation phenomenon, and the de- 
bate in the scientific community concerning its extent, most authors agree that a ro- 
bust indicator should take into account self-citations. Interestingly, rather than propos- 
ing novel metrics, the few attempts to deal with self-citations all tend to filter them 
out in the global computing of the h-index |i2n [T8l . 

This lack of solutions in the current literature is filled by this work, which repre- 
sent an attempt to propose a novel metric, called V-index, that tries to better reflect the 
impact of academics (authors and journals as well) taking into account their tendency 
to abuse of self-citations. 

The rest of the paper is organized as follows: section|2]presents the ideas underly- 



ing our proposal (see section 2.1 1 and, in addition, all the technical details including 



the mathematical formulation (section 2.2 1, its geometric interpretation (section 2.3 1 



and some further possible extensions (section 2.4 1. 

In section |3] we present the adoption of the V-index in three real-world contexts, 
i.e. (i) the assessment of the impact of the most prolific Computer Science authors 
(according to the DBLP ranking) as of 2011, the evaluation of the impact belong- 
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ing to the field "Computer Science Applications" of SCImago (to which also this 
journal belongs), and (Hi) the research impact assessment for year 201 1 of 130 coun- 
tries all over the world. Complementary to the quantitative evaluation, a qualitative 
analysis is carried out in order to clarify which insights and information are obtained 
by means of the adoption of V-index in addition to the original h-index. 

Finally, in section |4] the paper concludes summarizing our contribution and de- 
picting some future potential extensions. 

2 Measuring Virtuosity of Academics 

2. 1 Design Goals 

Before providing a formal description of our novel index, we illustrate the main ideas 
behind it. We start from a real-life example and we use it to derive some "require- 
ments" the metric should satisfy. 

Let an academic committee be organized to award a given fellowship, an appoint- 
ment or a career promotion (or, alternatively, to assess the impact of some journals 
belonging to a given field) - both these opportunities commonly arise in the academic 
context. In this context, it is reasonable to assume that a set of potential candidates has 
been previously selected and the committee assignment is to appoint one (or a few) of 
these academics with the given award. In the ideal case, the assessment of the merits 
should be as less unbiased as possible, such that all candidates are equally evaluated 
with respect to their scientific achievements. A part of the contribution to their evalu- 
ation will be probably accounted for their publication records and the relative impact 
of these works in the scientific community. 

The committee would probably exploit one of the existing standard bibliograph- 
ical indicators so that a quantitative assessment of the impact of the publications is 
obtained with few effort. In that case, the indicator that could be taken into account 
could be the h-index, given all the desirable features that it provides (summarized 
in the previous section). On the other hand, it is well-know that the h-index is not 
robust enough taking into account self-citations. Hence, it would be more appropri- 
ate to adopt a similar indicator that, preserving the desirable h-index features, would 
weight self-citations penalizing (or, at least, highlighting for further investigations) 
those authors who apparently abused of the possibility of self-citing their work. 

Summing up, we guess that this novel metric should satisfy the following require- 
ments: 

Requirement 1 - Relatedness to h-index. Since the h-index is proved to reflect 
some highly-desirable features, it represents an invaluable tool to assess the impact of 
academics, and this is essentially testified by its wide adoption. The proposed metric 
should relate to h-index, providing a reasonable formulation capable of maintaining 
the original desirable properties of the h-index, but including the ability to deal with 
self-citations. 

Requirement 2 - Reflect the virtuosity of authors. The phenomenon of self-citations 
has several interpretations. Some of them are reasonable or even physiological. Some 
others are deprecable and disreputable. On the other hand, there is currently a lack of 
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bibliographical indicators capable of dealing with self-citations (unless, we consider 
the option of filtering out self-citations as reasonable, that in some contexts could 
be a viable solution). This implies that our proposed metric should take into account 
the number of self-citations in order to give a better estimation of the virtuosity of 
an author. This metric should be able to highlight those authors that, for any reason 
(which could be further investigated), produced a relevant amount of self-citations. 

Requirement 3 - Extensibility to journals. One of the desirable features of the h- 
index is its extendibility to assessing the impact of journals |20|. The proposed metric 
should extend the ability of the original h-index of evaluating the impact of journals 
taking into account the number of self-citations provided by papers published by a 
given journals to those previously published by the same journal. Since it has been 
recently put into evidence the spread of the deprecable phenomenon of coercive cita- 
tions 1 26 1, the ability of highlighting journals whose number of self-citations heavily 
inflates their impact, appears now a necessity - at least to enable committees for 
further, more qualitative investigations. 

Requirement 4 - Simple computation. Not only the metric should be related to 
the h-index but also be a function of it, making possible its computation from the 
original h-index value without the need of the whole citation data. In fact, several ci- 
tation database services (e.g., Thomson ISI, SCImago, Scopus, etc.) already provide 
an overall count of self-citations per author or per journal, but is rather more diffi- 
cult to analyze publication per publication how these self-citations are distributed. 
For large scale evaluations, such as for example the research assessment of scholars 
affiliated to a department, it could be even unfeasible (in terms of human efforts) to 
filter out all self-citations to each publication for each evaluated author. Similarly, in 
the quest of assessing the impact of journals belonging to a given field, it could be 
extremely time consuming the process of filtering out the self-citations considering 
each paper published by each journal involved in the evaluation. 

Summarizing, it clearly emerges the necessity for a metric which takes into ac- 
count the overall number of self-citations per author/journal, in order to assess the 
impact of self-citations on the evaluation system. 

In the following, we present the V-index which represents our attempt to address 
these requirements. 



2.2 V-index 

Prior to formally define the V-index, in the following we introduce the ratio between 
the number of genuine citations (i.e., non-self-citations) and overall citations, hence- 
forth called virtuosity rate and denoted as Vrate 

^ _^ Self-citations ^ SC C-SC ^ 

Overall Citations C C 

It is worth to note that the virtuosity rate can be applied to authors as well as to 
journal^ At this point, it is possible to define the V-index as follows: 

' In the case of journals, the definition of self-citation provided above (see Definition jij must be 
an'anged according to the fact that a journal self-citation is considered as any citation from a paper to 
another one published in the same venue. 
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Definition 3 The V-index (denoted by ly) it is defined as the square root of the prod- 
uct between the h-index squared and the virtuosity rate 

h^-iC-SC) 



/v-y^^M^;:-^^^^^^, (2) 

where h represents the h-index, C the overall number of citations and SC the number 
of self-citations. 

Given the previous definitions, for further analysis we here remind the standard 
definition of a useful indicator know as citation per publication coefficient, Cp, that 
represents the ratio between the overall number of citations and the number of citable 
documents (of a given author or journal, indifferently), defined as 

^ Overall Citations C 
Citable Documents CD ' 

where CD is defined as the number of citable documents, and C has the same meaning 
as in equation ([TJ. In the light of the definition of virtuosity rate, we hereby can 
define the adjusted citation per publication coefficient Vp simply as the citation per 
publication coefficient computed with respect to the virtuosity rate, formalized as 
follows 

C / SC\ C C SC C~SC 

Vp-CpVra,e = -^[l--^j--^ —^--^. (4) 

The defined index will now be theoretically justified in the following subsection. 



2.3 Derivation 

An intuitive geometric derivation of this index is shown in Figure 1 . In that figure, 
curve g shows the number of citations per article where the publications have been 
sorted in decreasing order or citations. Indeed, curve / gives the same value but with- 
out considering self-citations. 

Obviously, the area under g is C, and the one under / is C — SC (whose values are 
known). Therefore, the ratio between the areas of / and g is just the virtuosity rate, 
i.e., Vrate, Stated in Equation[5] 

area under /■ C — SC 

^ = —p:r- = Vrate ■ (5) 

area under g C 

If we denote by h the observed h-index using all the citations C, and define the 
virtuosity index, ly as the expected h-index without self-citations (whose value is 
unknown), we could derive the latter assuming that the ratio between the areas of the 
squares with lengths h and ly present the same proportion than their corresponding 
citation curves. Equation (|6]l summarizes the assumption: 

72 

^=Vra,e- (6) 
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Fig. 1 Citation curves and tlieir corresponding li-indexes 

Obviously, solving for /y, the previous equation leads to our definition of the 



Another interpretation of this index is given in the framework of Lotkaian systems 
||8]. As stated in Q, the variation of the h-index, from /lo/^/ to /i„j.n , considering the 
removal of a fraction of k sources is computed as h„eu: — hoid ■ \/l—k, which exactly 
matches our definition of V-index, where the sources removed are precisely the set of 
self-citations. 



2.4 Possible extensions 

The V-index, defined in section |2.2| can be seen as a replacement of the original 
h-index where a scaling factor, \JYrute^ has been introduced as a correction. This 
correction is a real-valued weight between and 1, computed from the self-citations 
ratio, and is designed to mitigate their effect over the original index. V-index can 
also be seen as a reasonable approximation to the h-index without self-citations (see 
section [23] for that derivation). If we forget this theoretical derivation of this index, 
observing it in this form 



we could consider a family of general V-indexes, one for each different values of the 
function /. Thus, the "canonic" version would be the one where / is the square root 
of Yrate- Obviously not all values of / are reasonable. In fact, /, defined from [0, 1] to 
[0, 1] should have the following desirable properties: 

1. / should be a monotonically increasing function (i.e., given two values xi < X2, 
f{xi) < fixi). This is easily verifiable if /'(x) > 0,Vx G [0, 1]. 

2. /(I) = 1, meaning that there is no discount if no self-citations are received. 

The first property is needed if we want to penalize more the effect of having a 
greater proportion of self-citations (the opposite would be counterintuitive). 



V-index given in equation (j2]). 



Iv=f{Vrate)h, 



(7) 
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Given those two properties, it can be seen that the (uncorrected) h-index is gen- 
eralized by this genera V-index, choosing f{x) = 1 (i.e., self-citations are not consid- 
ered). Other possible values could be proposed for /, falling mainly into two main 
categorie^ (i) concave functions, for instance f{x) ^x^l'\n > 1; and (ii) convex 
functions, like /(x) —x",n > 1. 

A convex function like f{x) — x^ would highly penalize the occurrence of self- 
citations, having only values close to 1 if we move very near 1 (no self-citations 
needed). For instance, having a Vrate of 0.8, 0.8^ = 0.512, which means that we pe- 
nalize self-citations in a very strong way. 

The set of concave functions, on the other hand, would allow a certain proportion 
of self-citations, having a negligible effect if this ratio is high (for example, for a Vrate 
of 0.8, a/oTS — 0.894, leaving the original h-index at 90% of its original value. 

In fact, for the family of convex functions, it is known that f{x) < x, making the 
penalizing weight lower that the original ration (and equal, only in the case that .it 1). 
For concave functions the effect is the opposite, being f{x) > x. Both concavity and 
convexity can be stated if /"(jc) >Oor/"(x) <Oin all the domain (the interval [0,1]). 

We think that the use of concave functions is more suitable for the problem of 
down- weighting self-citations. In fact, in any scientific environment, self-citations 
tend to occur often in a natural way for different reasons ifTSl (the usual case is that 
the scientific work of a group or a person is built as an extension or improvement of 
some previous works). 



3 Experimentation 

In this section we illustrate three relevant fields of applications of our novel metric, 
which are, respectively, ( /) the evaluation of the impact of academic authors (section 



3.1 1, the assessment of the virtuosity of academic journals (section 3.2 1 and, fi- 
nally, the establishment of the impact of the research of countries (section 3.3 1. 
In addition, we provide analytical evaluation of the correlation between results and 
ranking provided by the adoption of our measure against the original h-index, and the 
/i* -index, i.e., the h-index after filtering out the self-citations, publication per publi- 
cation. 



3.1 Case 1 - Authors impact evaluation 

One of the most common assignments in academia is the evaluation of research im- 
pact of a set of authors. The motivations are potentially different, as previously dis- 
cussed, but a common feature characterizes this task: each author provides a set of 
scientific publications which should be evaluated in a possibly unbiased way. We be- 
lieve that the V-index represents an indicator of the impact and virtuosity for authors, 

- In fact, two additional comments should be added. First, there is a function (f{x) = 1) which is both 
convex and concave at the same time. Also, there could be functions which change their curvature across 
the domain (i.e., they change from convex to concave or vice versa one or more times across [0, 1]). We do 
not find any rationale for the use of this last kind of functions. 
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measuring better her/his achievements. To show that in this section we show the anal- 
ysis of the most prolific Computer Science authors (according to the DBLP ranking, 
as of 201 1). By means of this evaluation we will be able to prove that the V-index is 
a correct and unbiased approximation of the /;* -index (which is the h-index that we 
would obtaining filtering out all the self-citations, publication per publication). 

To carry out this experiment we consulted the "most prolific" Computer Scien- 
tists list provided by DBLF|^ as of 20 11. Then, for each author appearing in the list, 
we manually retrieved her/his list of publications, matching her/his profile in Sco- 
pus. Together with this list, we thus obtained the total number of citations, and the 
number of self-citations to all her/his papers, the h-index and the /z*-index, i.e., the 
indicator of the impact of each author with and without considering the self-citations. 
Retrieved this set of information, we are able to compute our virtuosity rate and the 
corresponding V-index for each of considered authors. Results related to the top 25 
authors are reported in Table [T] 

In detail, we report: the total number of citable documents (CD) and the relative 
position in the ranking according to this criterium; the number of citations (C), self- 
citations (SC) and the citations per publication coefficient Cp\ the h-index and the 
position according to this criterium, together with the /i* -index; finally, the virtuosity 
rate Vi-ate, the adjusted citation per publication coefficient Vp and the V-index, to- 
gether with the ranking produced according to this criterium, and the ratio between 

the V-index and the h-index, defined as r = V-index ^ ix_ 

h-mdex " 

For sake of simplicity, for this first experiment we discuss the results splitting 
our analysis in two parts. First, we consider what emerges from the evaluation just 
reviewing the standard indicators provided by the current literature (e.g., number of 
publications, citations per publications coefficient, h-index, /!*-index). In the second 
part of our analysis, we discuss the contribution provided by our novel indicator. 



The first interesting aspect that comes out from the analysis of Table 3.1 is that 
single criteria (e.g., number of publications, number of citations) are not reliable in- 
dicators of the overall impact of each author It is clearly shown that the ranking 
according to the total number of publications strongly differ from any other rank- 
ing based on multiple criteria (e.g., h-index, V-index). Apparently, also the citation 
per publication coefficient seems not to be very effective in the representation of the 
impact of considered scholars; in addition, it is evident that this indicator is linearly 
dependent of the number of self-citations, thus its bias is more marked than that of 
h-index or /i*-index. Indeed, the most interesting insights come from the analysis of 
these two indicators. For all considered authors, there is a drop in the h-index, after 
filtering out the self-citations, which ranges from negligible values (w 3%) to consid- 
erable values (up to « 23%). The average and median drop in h-index removing the 
self citations are respectively « 10% and w 9%, with a standard deviation a = 5.7%. 
Another interesting consideration is that the standard deviation of the h-index distri- 
bution is (7 = 8.1 12, smaller than that of the /;* -index which is equal to a = 8.52. This 
means that h-index is susceptible to fluctuations due to the presence of self-citations. 



^ The list is located in the following URL: http://www.informatik.uiii-tri6r.de/-ley/db/ 
|indices/a-tree/prolif ic/| 
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A 1 ithrM^ 


CD 


(pos.) 


C 


SC 




h-index (pos.) h* 






V-index (pos.) Ratio 


Huang, Thomas 


784 


7 


8,956 


650 


11.423 


44 


2 


43 


0.92710.594 42.373 


1 


0.963 


Wang, Wei 


1,183 


4 


10,723 1,558 


9.064 


44 


1 


41 


0.855 7.747 


40.678 


2 


0.925 


Yu, Philip 


392 


20 


5,348 


305 


13.643 


41 


3 


40 


0.94312.865 


39.814 


3 


0.971 


Poor, H. Vincent 


784 


8 


8,501 


955 


10.843 


39 


5 


38 


0.888 9.625 


36.744 


4 


0.942 


Han, Jiawei 


302 


24 


5,985 


299 


19.818 


37 


6 


36 


0.95018.828 36.064 


5 


0.975 


Li, Ming 


1,542 


2 


11,049 1,946 


7.165 


39 


4 


34 


0.824 5.903 


35.399 


6 


0.908 


Shin, Kang 


541 


13 


4,898 


248 


9.054 


35 


8 


34 


0.949 8.595 


34.102 


7 


0.974 


Li, Xin 


1,568 


1 


8,591 


1,648 


5.479 


36 


7 


33 


0.808 4.428 


32.363 


8 


0.899 


Seidel, Hans Peter 


351 


23 


4,427 


603 


12.613 


31 


11 


29 


0.86410.895 


28.812 


9 


0.929 


Sangiovanni-Vincentelli, Alberto 572 


12 


3,239 


398 


5.663 


30 


13 


28 


0.877 4.967 


28.096 


10 


0.937 


Chang, ChinChen 


704 


10 


4,300 


794 


6.108 


31 


10 


28 


0.815 4.980 


27.992 


11 


0.903 


Bertino, Elisa 


372 


21 


3,296 


353 


8.860 


29 


14 


28 


0.893 7.911 


27.403 


12 


0.945 


Pediycz, Witold 


752 


9 


4,643 


1,262 


6.174 


32 


9 


28 


0.728 4.496 


27.307 


13 


0.853 


Wang, Jun 


1,038 


5 


4,892 


971 


4.713 


30 


12 


27 


0.802 3.777 


26.858 


14 


0.895 


Gao, Wen 


887 


6 


3,901 


509 


4.398 


26 


15 


24 


0.870 3.824 


24.245 


15 


0.932 


Kandemir, Mahmut Taylan 


424 


18 


2,350 


275 


5.542 


23 


20 


21 


0.883 4.894 


21.612 


16 


0.940 


Abraham, Ajith 


368 


22 


2,239 


426 


6.084 


24 


16 


21 


0.810 4.927 


21.596 


17 


0.900 


Zhang, Yan 


607 


11 


2,237 


338 


3.685 


23 


17 


21 


0.849 3.129 


21.191 


18 


0.921 


Wu, Jie 


463 


16 


2,189 


387 


4.728 


23 


18 


21 


0.823 3.892 


20.868 


19 


0.907 


Liu, Yang 


1,213 


3 


3,712 


788 


3.060 


22 


21 


20 


0.788 2.411 


19.526 


20 


0.888 


Hancock, Edwin R. 


462 


17 


2,337 


677 


5.058 


23 


19 


19 


0.710 3.593 


19.384 


21 


0.843 


Reddy, Sudhakar M. 


529 


15 


2,120 


616 


4.008 


22 


22 


17 


0.709 2.843 


18.530 


22 


0.842 


Liu, Wei 


532 


14 


1,825 


493 


3.430 


19 


23 


15 


0.730 2.504 


16.232 


23 


0.854 


Rozenberg, Grzegorz 


288 


25 


864 


183 


3.000 


17 


24 


15 


0.788 2.365 


15.093 


24 


0.888 


Piattini, Mario 


394 


19 


1,182 


423 


3.000 


16 


25 


13 


0.642 1.926 


12.821 


25 


0.801 



Table 1 Top 25 "Most Prolific Computer Science" authors, reporting citable documents (CD), citations 
(C), self-citations (SC). 



In the light of previous considerations, we can analyze the contribution provided 
by our indicator First of all, we study the virtuosity rate of considered authors. The 
Vrate distribution ranges from a minimum of ss 64% (which means that more than one 
third of overall citations are indeed self-citations) to a maximum of w 95%. The av- 
erage and the median virtuosity rates approximately coincide, equal to w 82%, with a 
standard deviation of « 8%. This means that in average each scholar self-cites her/his 
own works for an amount of more or less 20% of overall citations, and provides the 
remaining 80% of citations to alien works. This is an interesting finding since it re- 
calls the famous 80-20 Pareto principle |17|. Considering the adjusted citation per 
publication coefficient, it emerges clearly that self-citations largely affect this indi- 
cator, in fact the average drop with respect to the non-adjusted coefficient is « 17% 
(with a standard deviation of w 8%), ranging from a minimum of ss 5% to a maximum 
of ss 35%. These results support the concerns raised by the scientific community on 
the validity and robustness of this indicator to assess the impact of academics. 

Thus, we consider the V-index and the ranking produced by our indicator. First of 
all, we notice that the ranking produced by V-index is slightly different with respect 
to that produced by the h-index. On the other hand, the ranking produced by V-index 
is absolutely the same of that produced if we would rank authors with respect to the 
/i* -index, i.e., by filtering out the self-citations. 

Interestingly, to produce the V-index we only require the overall number of self- 
citations, while to produce the /z* -index we must be aware of the distribution of self- 
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citations among the whole set of publications. Since this information in seldom avail- 
able, and its acquisition is time consuming, we argue that the utility of the V-index 
would be very high if we could prove that the distribution of values produced is cor- 
related to that produced by the h*-mdex. 

To do so, we measure the correlation between these two distributions by means 
of the Pearson correlation coefficient, defined as 

_ coviX,Y) 

whose results lie in the interval [—1, +1]. The higher the value of px.y, the better two 
distributions are correlated. We recall that the Pearson correlation coefficient tells us 
whether the two distributions X and Y are deterministically related or not. 

Obtained results are quite thrilling: the correlation coefficient computed between 
the V-index distribution and the /z* -index (i.e., the h-index obtained filtering out the 
self-citations, publication by publication) is equal to P/^./,* = 0.997. It emerges an ex- 
tremely strong correlation between the two distributions, testified by a ;?-value report- 
ing a significance level of p = 2.33 x 10^^^ (therefore showing a strong significance 
of the previous correlation value). 

We conclude this analysis reporting that the standard deviation of the V-index dis- 
tribution is (J = 8.25, lower than that of /;* -index and that the average ratio between 
V-index and h-index is « 91%, with a standard deviation of « 4.4%. 

In the quest of avoiding discussing the per-case author, it is worthy to note that, 
even if there are not big distortions in the ranking, some authors advanced of few 
positions thanks to their virtuous behaviors, while some authors lose some positions 
being their virtuosity rate much smaller than direct competitors. The picture provided 
by the V-index leaves space for further qualitative investigations on the motivations 
of such tendency to self-cite or not own works, and it comes at no cost. 



3.2 Case 2 - Journal impact assessment 

The second example provided in this work is based on the evaluation of virtuosity of 
scientific journals. To this purpose, we selected the set of journals belonging to the 
category "Computer Science Applications" of the SCImago Journal Ranking [22 1 - 
to which also this journal belongs. This category, as 201 1, contains 194 journals with 
heterogeneous and broad topics {e.g., bioinformatics, computational methods applied 
to different fields, information systems, bibliometrics, and much more). Each journal 
is provided with a series of statistics, summarized in Table[2]for the top 25 journals, 
such as the total number of citable documents (CD) - and the relative ranking; the 
number of citations (C), self-citations (SC) and the citations per publication coeffi- 
cient Cp; the h-index - and the relative ranking. We report in addition all statistics 
related to the Vrate, Vp and V-index - and the relative ranking -, and the ratio between 
the V-index and the h-index. 

The first important difference that emerges with respect of the previous experi- 
ment is that there is not any reference to the /i* -index. In fact, in this case there is 
no actual knowledge on the distribution of the citations among papers published by 
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Journal CD (pos.) C SC Cp h-index (pos.) V„,j Vp V-index (pos.) Ratio 



Bioinformatics 


2,104 


3 


6,510 


567 


2.93 


176 


1 


0.913 


2.675 


168.161 


1 


0.955 


IEEE T Med Imaging 


503 


20 


1,229 


107 


2.24 


116 


3 


0.913 


2.045 


110.835 


2 


0.955 


J Comput Pliys 


1,387 


4 


2,595 


622 


1.62 


116 


2 


0.760 


1.232 


101.147 


3 


0.872 


Comput Method AppI M 


868 


8 


1,626 


300 


1.74 


92 


4 


0.815 


1.419 


83.081 


4 


0.903 


J Chem Inf Model 


697 


11 


1,839 


286 


2.65 


90 


5 


0.844 


2.238 


82.706 


5 


0.919 


BMC Bioinformatics 


2,192 


2 


3,881 


335 


1.52 


78 


6 


0.914 


1.389 


74.558 


6 


0.956 


Comput Phys Commun 


721 


10 


1,151 


109 


1.68 


78 


7 


0.905 


1.521 


74.215 


7 


0.951 


IEEE T Comput Aid D 


670 


13 


568 


73 


0.63 


70 


8 


0.871 


0.549 


65.347 


8 


0.934 


Comput Chem Eng 


610 


14 


1,089 


160 


1.7 


68 


9 


0.853 


1.450 


62.806 


9 


0.924 


Med Image Anal 


190 


59 


582 


44 


2.71 


60 


11 


0.924 


2.505 


57.687 


10 


0.961 


Comput Struct 


461 


24 


667 


61 


1.3 


54 


14 


0.909 


1.181 


51.472 


11 


0.953 


Int J Numer Meth Fl 


675 


12 


587 


76 


0.74 


55 


13 


0.871 


0.644 


51.316 


12 


0.933 


Inform Process Manag 


222 


50 


277 


12 


0.91 


51 


16 


0.957 


0.871 


49.883 


13 


0.978 


lET Control Theory Appl 


489 


21 


411 


51 


0.66 


53 


15 


0.876 


0.578 


49.603 


14 


0.936 


Inform Sciences 


1,008 


5 


2,595 


921 


2.32 


61 


10 


0.645 


1.497 


48.994 


15 


0.803 


IEEE T Syst Man Cy C 


175 


62 


328 


17 


1.39 


49 


19 


0.948 


1.318 


47.713 


16 


0.974 


J Chem Theory Comput 


877 


7 


2,880 


380 


2.85 


50 


17 


0.868 


2.474 


46.585 


17 


0.932 


Expert Syst Appl 


2,773 


1 


6,064 


2290 


2.08 


57 


12 


0.622 


1.295 


44.967 


18 


0.789 


Med Biol Eng Comput 


383 


30 


446 


71 


1.14 


48 


20 


0.841 


0.959 


44.014 


19 


0.917 


Cyberpsychol Behav Soc 


Netw 336 


38 


510 


53 


1.32 


46 


22 


0.896 


1.183 


43.544 


20 


0.947 


IEEE T Inf TechnolB 


373 


31 


440 


40 


1.04 


45 


26 


0.909 


0.945 


42.906 


21 


0.953 


Struct Multidiscip O 


408 


28 


395 


82 


0.98 


47 


21 


0.792 


0.777 


41.838 


22 


0.890 


Comput Ind 


205 


54 


338 


59 


1.29 


46 


24 


0.825 


1.065 


41.793 


23 


0.909 


J Syst Software 


536 


19 


583 


48 


0.85 


43 


27 


0.918 


0.780 


41.192 


24 


0.958 


J Parallel Distr Com 


302 


44 


289 


16 


0.75 


41 


29 


0.945 


0.708 


39.849 


25 


0.972 



Table 2 Top 25 "Computer Science Applications" journals, reporting citable documents (CD), citations 
(C), self-citations (SC). Journal names are abbreviated using the standard ISl nomenclature. 



each journal, and indeed there is not a practical way to infer the /z* -index, i.e., the 
h-index obtained after filtering out the self-citations to each paper published by any 
given journal. 

In this context, the power and the utility of the proposed indicator is clear: we 
can produce a statistically significant prediction of what would be the h-index of 
each journal if we would filter out the self-citations (of each published paper by all 
journals), by means of our V-index. 

The assessment is straightforward: single criteria indicators fail to reflect journals 
impact (note the fluctuations in the ranking proposed by the number of published 
papers, or the overall citations). Similar considerations hold in the case of the citation 
per publications coefficient - some journals with high values lie in the middle of the 
ranking according to h-index based indicators. 

In this case, we highlight that the ranking produced by the V-index is in agreement 
with that produced by the h-index: a small number of papers gain or lose few positions 
in the ranking, according to their heightened or lessen tendency to self-citations. 

In practical contexts, the V-index can be adopted as a simple and efficient tool to 
identify scientific venues that are potentially involved in the deprecable practice of 
coercive citations Il26l . with the aim of inflating their impact factor and h-index. 

Being our V-index a statistically significant prediction of the /z* -index we can 
proceed as in the previous case providing some statistics regarding the virtuosity of 
the considered journals. Similarly to the assessment in the case of authors, regarding 
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journals there exists a drop in the h-index with respect to the V-index that ranges from 
a minimum of w 2.5% to a (potentially alarming) maximum of w 22%. Even though, 
in case of journals the average and median drop are respectively « 7.3% and « 6.4% 
with a standard deviation of a = 4.7% - lower than that of authors. 

Regarding the virtuosity rate, the distribution ranges from a (suspicious) min- 
imum ss 62% (which means that more tha one third of overall citations are self- 
citations) to a (highly fair) maximum of « 96%, with an average and a median of 
respectively w 86% and w 87%, and a standard deviation a = 8.4%. In addition, the 
average ratio between the V-index and the h-index is « 92% with a standard deviation 
of « 4.7%. This means that, in spite self-citations alter the h-index and in some case 
also the ranking of journals, the "average journal" belonging to the category "Com- 
puter Science Application" is reasonably not used to the practice of self-citations. It 
would be of extreme interest to track these habits across different scientific fields (as 
briefly discussed in the conclusive section). 

Concluding, it is worth noting that, since the V-index is computed as a function 
of the h-index and a convex function of V rate, computing a correlation between them 
in this case is not meaningful. 

3.3 Case 3 - Country research assessment 

The last case study included in this work is a large scale assessment of research 
impact for 130 countries all over the world. Data summarized in Table [3] for the top 
25 countries, report the same statistics of the previous case. 

It immediately emerges that, also in this case, there is no clue regarding the distri- 
bution of self-citations - only the overall count is provided. More importantly, in this 
case it would be extremely complex, without having complete access to a database 
such as Thomson ISI, or Scopus, to obtain, for each country, the list of authors and 
relative publications, in order to extract the distribution of self-citations - without 
even taking into consideration the amount of time required for this operation, even if 
it would be technically possible. 

Interestingly, the V-index is capable of providing additional insights related to 
habits and virtuosity of countries, at the largest scale perspective, and this possibility 
comes at no cost. 

Considering data reported in Table [3] it emerges that in this case single crite- 
ria seem slightly more reliable. For example, countries with a large research output, 
represented by numerous publications, tend to have h-index based indicators on av- 
erage higher - on the other hand there are some interesting outliers (such as Panama, 
Uganda or Estonia, that have high indexes although the research output, intuitively, 
is lower). 

Taking into account the contribution provided by virtuosity rate and V-index, it is 
worth to note that some countries appear to be more affected than others. In general, 
a trend correlating low Vrate to large research output can be envisaged (some forms of 
correlation could be investigated, as discussed in future works). On the other hand, 
the ranking produced by V-index slightly shuffle that obtained by using the standard 
h-index. In fact, some countries gain some positions in the ranking thanks to their 
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Country 


CD 


(pos.) 


C 


SC 


Cp 


h-index (pos.) 




V IJiLlt A 


(pos.) Ratio 


United States 


4,972,679 


1 


100,496,612 46,657,626 20.18 


1.229 


1 


0.563 10.81 1 


899.549 


1 


0.732 


United Kingdom 1,392,982 


4 


24,535,306 


5,911,758 


17.42 


750 


2 


0.579 13.223 


653.426 


2 


0.871 


Germany 


1,321,606 


5 


20,437,971 


5,412,521 


15.79 


657 


3 


0.735 11.608 


563.327 


3 


0.857 


France 


964,320 


6 


14,156,535 


3,310,129 


15.09 


568 


5 


0.766 11.562 


497.179 


4 


0.875 


Japan 


1,429,881 


3 


16,452,234 


4,953,600 


11.72 


580 


4 


0.699 8.191 


484.885 


5 


0.836 


Canada 


748,787 


7 


12,187,113 


2,406,404 


17.55 


515 


6 


0.803 14.085 


461.362 


6 


0.896 


Netherlands 


409,982 


14 


7,805,760 


1,342,441 


20.05 


506 


8 


0.828 16.602 


460.438 


7 


0.910 


Australia 


485,249 


11 


7,083,995 


1,532,649 


16 


509 


7 


0.784 12.538 


450.586 


8 


0.885 


Italy 


720,911 


8 


9,861,600 


2,316,810 


14.45 


450 


9 


0.765 11.055 


393.607 


9 


0.875 


Spain 


547,858 


9 


6,573,014 


1,692,724 


13.12 


448 


10 


0.742 9.741 


386.028 


10 


0.862 


Sweden 


292,150 


18 


5,410,618 


905,907 


19.09 


412 


11 


0.833 15.894 


375.930 


11 


0.912 


Switzerland 


292,254 


17 


6,007,936 


848,894 


21.77 


368 


14 


0.859 18.694 


341.012 


12 


0.927 


Panama 


2,526 


106 


55,507 


6,047 


27.86 


330 


16 


0.891 24.825 


311.507 


13 


0.944 


Denmark 


154,612 


24 


3,015,221 


452,805 


20.42 


336 


15 


0.850 17.353 


309.745 


14 


0.922 


Brazil 


318,294 


15 


2,409,214 


783,003 


9.57 


373 


13 


0.675 6.460 


306.450 


15 


0.822 


Iran 


117,469 


30 


499,322 


204,982 


7.68 


398 


12 


0.58y 4.527 


305.575 


16 


0.768 


Finland 


149,390 


25 


2,447,743 


415,216 


17.64 


288 


18 


0.830 14.648 


262.439 


17 


0.911 


Norway 


116,973 


31 


1,749,741 


294,571 


16.63 


287 


19 


0.832 13.830 


261.729 


18 


0.912 


Uganda 


4,948 


86 


62,314 


10,522 


15.73 


285 


20 


0.831 13.074 


259.826 


19 


0.912 


Belgium 


224,898 


20 


3,621,954 


555,562 


17.1 


262 


21 


0.847 14.477 


241.070 


20 


0.920 


Portugal 


96,937 


33 


960,473 


198,308 


12.14 


258 


22 


0.794 9.633 


229.827 


21 


0.891 


Estonia 


14,106 


62 


150,084 


29,699 


12.63 


256 


23 


0.802 10.131 


229.276 


22 


0.896 


Israel 


177,814 


22 


2,898,025 


433,162 


16.66 


247 


25 


0.851 14.170 


227.794 


23 


0.922 


China 


1,833,463 


2 


7,396,935 


3,937,424 


5.66 


316 


17 


0.468 2.647 


216.107 


24 


0.684 


Chile 


48,964 


44 


505,589 


98,339 


12.69 


234 


26 


0.805 10.222 


210.014 


25 


0.897 



Table 3 Top 25 country research assessment 2011, reporting citable documents (CD), citations (C), self- 
citations (SC). 



high values of Vrme that tends to reward them with better positions, hi general, the 
virtuosity rate values range from a minimum of w 46% to a maximum of « 89%, 
with an average of « 76.7% and a standard deviation of w 10.4% - these values are 
lower than both the previous cases. 

Concluding, the average ratio is « 87.4% with a standard deviation of « 6.3% 
(much worse than previous cases), that confirms that on a large scale, self-citations 
affect more deeply the evaluation system and that V-index represents an invaluable 
tool to assess the research impact from an empowered perspective. 



4 Conclusions 

In this paper presented the V-index, a new metric to assess the virtuosity of academics. 
The index is devised to take into account self-citations in the computation of the 
impact of research, rewarding those authors or journals with a lessen tendency to 
self-citations and penalizing those who abuse of this practice. 

The main strength of this indicator is that it does not require any prior knowledge 
on the distribution of self-citations among publications. We proved that, in absence of 
any information about self-citation distributions, our V-index represents a statistically 
significant prediction of the value that the h-index would acquire if we would be able 
to filter out self-citations from each publication. 
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We assessed the adoption of V-index in three practical use cases, i.e., evaluating 
authors, journals and country research impact, witnessing that the V-index represents 
an invaluable tool to highlight potentially anomalous situations (such as "niche" re- 
search fields, coercive citations, heightened tendency to self-citing and so on). 

As for future work, two relevant ongoing research lines are here described. The 
first is related to the analysis of networks of citations adopting the V-index. This 
would be done, in particular to the purpose of defining in different specific scientific 
fields the habits of authors and journals. The second is instead oriented to the inves- 
tigation of the possible existence of some forms of correlation between the V-index 
and other features, such as nationality of authors, size of the population of countries, 
scientific fields, etc. 
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